Natural Conversation


Voice-based AI Agents: Filling the Economic Gaps in Digital Health Delivery

Wen, Bo, Wang, Chen, Han, Qiwei, Norel, Raquel, Liu, Julia, Stappenbeck, Thaddeus, Rogers, Jeffrey L.

arXiv.org Artificial Intelligence

The integration of voice-based AI agents in healthcare presents a transformative opportunity to bridge economic and accessibility gaps in digital health delivery. This paper explores the role of large language model (LLM)-powered voice assistants in enhancing preventive care and continuous patient monitoring, particularly in underserved populations. Drawing insights from the development and pilot study of Agent PULSE (Patient Understanding and Liaison Support Engine), a collaborative initiative between IBM Research, Cleveland Clinic Foundation, and Morehouse School of Medicine, we present an economic model demonstrating how AI agents can provide cost-effective healthcare services where human intervention is economically unfeasible. Our pilot study with 33 inflammatory bowel disease patients revealed that 70% expressed acceptance of AI-driven monitoring, with 37% preferring it over traditional modalities. Technical challenges, including real-time conversational AI processing, integration with healthcare systems, and privacy compliance, are analyzed alongside policy considerations surrounding regulation, bias mitigation, and patient autonomy. Our findings suggest that AI-driven voice agents not only enhance healthcare scalability and efficiency but also improve patient engagement and accessibility. For healthcare executives, our cost-utility analysis demonstrates substantial potential savings for routine monitoring tasks, while technologists can leverage our framework to prioritize improvements yielding the highest patient impact. By addressing current limitations and aligning AI development with ethical and regulatory frameworks, voice-based AI agents can serve as a critical entry point for equitable, sustainable digital healthcare solutions.

Healthcare systems worldwide face growing challenges in allocating limited medical resources to meet increasing demand [1], [2]. Traditional healthcare delivery models, centered on episodic patient-provider interactions, often result in significant gaps in continuous care, particularly in preventive health monitoring and chronic disease management [2], [3]. These shortcomings disproportionately affect vulnerable populations, including those with limited access to healthcare facilities [4], lower technological literacy [5], or socio-economic constraints [6]. The advent of Large Language Models (LLMs) and multi-modal AI has opened new avenues for digital health applications [7]-[10], notably in voice-based patient engagement [11], [12]. Unlike earlier rule-based conversational agents, modern AI-driven voice assistants can facilitate context-aware, adaptive, and natural conversations that dynamically adjust to user preferences, health literacy levels, and immediate needs [13]. Voice, as humanity's most intuitive mode of communication, reduces engagement barriers and broadens access to healthcare, especially for underserved communities [12], [14].
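The paper's cost-utility argument can be illustrated with a toy per-patient cost model. All figures below (per-minute rates, call frequency) are hypothetical placeholders, not numbers from the study; the sketch only shows the shape of the comparison between human-led and agent-led routine monitoring.

```python
# Toy cost-utility sketch for routine monitoring (hypothetical figures,
# not taken from the Agent PULSE study).

def monthly_monitoring_cost(calls_per_month, minutes_per_call,
                            cost_per_minute, fixed_overhead=0.0):
    """Per-patient monthly cost of one monitoring channel."""
    return calls_per_month * minutes_per_call * cost_per_minute + fixed_overhead

# Assumed: four 10-minute check-ins per month; a nurse call at $1.00/min
# versus an AI voice agent at $0.05/min (speech + LLM inference).
nurse_cost = monthly_monitoring_cost(4, 10, 1.00)
agent_cost = monthly_monitoring_cost(4, 10, 0.05)
savings_pct = 100 * (nurse_cost - agent_cost) / nurse_cost
print(f"nurse: ${nurse_cost:.2f}, agent: ${agent_cost:.2f}, savings: {savings_pct:.0f}%")
# → nurse: $40.00, agent: $2.00, savings: 95%
```

Under these placeholder rates the agent handles routine check-ins at a small fraction of the human cost, which is the economic gap the paper argues such agents can fill.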


Fox News AI Newsletter: Amazing breakthrough for paralyzed man who can't speak

FOX News

VOICE BREAKTHROUGH: When someone loses the ability to speak because of a neurological condition like ALS, the impact goes far beyond words. Now, thanks to a team at the University of California, Davis, there's a new brain-computer interface (BCI) system that's opening up real-time, natural conversation for people who can't speak. It translates the brain signals that would normally control the muscles used for speech, allowing users to "talk" and even "sing" through a computer, almost instantly. JOBS ON THE LINE: If you've ordered food on Uber Eats recently, you may have seen a delivery robot instead of a human driver.


Paralyzed man speaks and sings with AI brain-computer interface

FOX News

When someone loses the ability to speak because of a neurological condition like ALS, the impact goes far beyond words. Now, thanks to a team at the University of California, Davis, there's a new brain-computer interface (BCI) system that's opening up real-time, natural conversation for people who can't speak. It translates the brain signals that would normally control the muscles used for speech, allowing users to "talk" and even "sing" through a computer, almost instantly.


CASPER: A Large Scale Spontaneous Speech Dataset

Xiao, Cihan, Liang, Ruixing, Zhang, Xiangyu, Tiryaki, Mehmet Emre, Bae, Veronica, Shankar, Lavanya, Yang, Rong, Poon, Ethan, Dupoux, Emmanuel, Khudanpur, Sanjeev, Perera, Leibny Paola Garcia

arXiv.org Artificial Intelligence

The majority (67.79%) reported speaking US English, reflecting the dataset's primary demographic. However, a significant proportion of non-native and regionally influenced English varieties are also present, including Chinese Mandarin-influenced English (4.81%), UK English (5.29%), and Indian English (2.88%). Additionally, 14.42% of participants did not specify an accent, indicating either an omission or variability in self-identification. The participants' accent and native language are based on their self-identification, for example, the number of speakers with an Arabic accent may differ from the number with Arabic as their native language. Age distribution reveals that younger speakers are over-represented, with 57.21% of participants in the 18-29 age range and 23.56% in the 30-39 range.
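As a quick sanity check on the self-reported accent shares quoted above, the named categories plus the unspecified group leave a remainder covering all other varieties (such as the Arabic-accented speakers the abstract mentions):

```python
# Self-reported accent shares from the dataset summary (percent of speakers).
shares = {
    "US English": 67.79,
    "Chinese Mandarin-influenced English": 4.81,
    "UK English": 5.29,
    "Indian English": 2.88,
    "unspecified": 14.42,
}
other = 100.0 - sum(shares.values())  # share left for all remaining varieties
print(f"other English varieties: {other:.2f}%")
# → other English varieties: 4.81%
```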


Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation

Neural Information Processing Systems

Recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech. However, an LLM-based strategy for modeling spoken dialogs remains elusive, calling for further investigation. This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM), designed to generate coherent spoken responses with naturally occurring prosodic features relevant to the given input speech without relying on explicit automatic speech recognition (ASR) or text-to-speech (TTS) systems. We have verified the inclusion of prosody in speech tokens that predominantly contain semantic information and have used this foundation to construct a prosody-infused speech-text model. Additionally, we propose a generalized speech-text pretraining scheme that enhances the capture of cross-modal semantics.
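The abstract describes the speech-text pretraining scheme only at a high level. Purely as an illustration of the general idea of cross-modal sequences (our assumption about the flavor of such schemes, not USDM's published recipe), one can interleave spans of discrete speech-unit tokens with transcript text so the model sees both modalities in one context:

```python
# Toy interleaving of text words and discrete speech-unit tokens.
# Which spans are rendered as speech units is an arbitrary choice here;
# real schemes would use alignments and sampling strategies.

def interleave(words, unit_spans, speak_every=2):
    """Emit text for most words; speech units for every `speak_every`-th word."""
    seq = []
    for i, (word, units) in enumerate(zip(words, unit_spans)):
        if i % speak_every == 1:
            seq += [f"<u{u}>" for u in units]   # discrete speech-unit tokens
        else:
            seq.append(word)                    # plain text token
    return seq

words = ["good", "morning", "how", "are", "you"]
units = [[12, 40], [7, 7, 19], [3], [88, 2], [5, 5, 61]]
print(interleave(words, units))
# → ['good', '<u7>', '<u7>', '<u19>', 'how', '<u88>', '<u2>', 'you']
```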


SER_AMPEL: a multi-source dataset for speech emotion recognition of Italian older adults

Grossi, Alessandra, Gasparini, Francesca

arXiv.org Artificial Intelligence

In this paper, SER_AMPEL, a multi-source dataset for speech emotion recognition (SER), is presented. The dataset's distinguishing feature is that it was collected to provide a reference for speech emotion recognition for Italian older adults. It was assembled following different protocols: acted conversations extracted from movies and TV series, and recordings of natural conversations in which emotions are elicited by appropriate questions. The need for such a dataset emerges from an analysis of the state of the art. Preliminary considerations on the critical issues of SER are reported, based on classification results for a subset of the proposed dataset.


Kids will soon be able to have natural conversations with Alexa

Engadget

Amazon used its annual hardware event on Wednesday to go all-in on Alexa's new large language model-infused capabilities, touting how easy it'll soon be to have a natural sounding conversation with the bot. This also extends to kids, as the company just announced Explore With Alexa. This is a pared-down and kid-friendly version of the updated chatbot that specializes in topics like animals and nature. It'll even play trivia games with your tykes and dispense daily fun facts. Of course, this is for kids, so the tech has been developed with guardrails to protect them from the more sinister parts of the Internet.


ChatSonic - Like ChatGPT but with superpowers

#artificialintelligence

ChatGPT is a conversational AI system created by OpenAI, the research company co-founded by Sam Altman. It is powered by a neural network that has been trained on millions of conversations. It is designed to understand natural language and respond in a meaningful way. The system is based on the GPT-3 family of large-scale language models developed by OpenAI, which have been trained on hundreds of billions of words from the internet. The model is used to generate text responses to user input in a conversational manner.


Turn-Taking Prediction for Natural Conversational Speech

Chang, Shuo-yiin, Li, Bo, Sainath, Tara N., Zhang, Chao, Strohman, Trevor, Liang, Qiao, He, Yanzhang

arXiv.org Artificial Intelligence

While streaming voice assistant systems are used in many applications, they typically focus on unnatural, one-shot interactions, assuming input from a single voice query without hesitation or disfluency. However, a common conversational utterance often involves multiple queries with turn-taking, in addition to disfluencies. These disfluencies include pausing to think, hesitations, word lengthening, filled pauses and repeated phrases. This makes speech recognition for conversational speech, including utterances with multiple queries, a challenging task. To better model the conversational interaction, it is critical to discriminate disfluencies from the end of a query, so that the user can hold the floor during disfluencies while the system responds as quickly as possible once the user has finished speaking. In this paper, we present a turn-taking predictor built on top of an end-to-end (E2E) speech recognizer. Our best system is obtained by jointly optimizing for the ASR task and for detecting when the user has paused to think or finished speaking. The proposed approach achieves over 97% recall and 85% precision in predicting true turn-taking with only 100 ms latency, on a test set designed with 4 types of disfluencies inserted into conversational utterances.
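The paper's predictor is a learned model jointly trained with the E2E recognizer. Purely for contrast, the kind of naive baseline it improves on, a fixed silence threshold with a filler-word hold, can be sketched as follows (the filler list and threshold are illustrative choices, not the paper's):

```python
# Naive turn-taking baseline: declare end-of-query after a long enough
# silence gap, but hold the floor when the last token is a filled pause.
# This is a contrast to the paper's learned predictor, not its method.
FILLERS = {"uh", "um", "hmm", "er"}

def end_of_query(tokens, silence_ms, gap_threshold_ms=700):
    """tokens: recognized words so far; silence_ms: trailing silence length."""
    if not tokens:
        return False
    if tokens[-1].lower() in FILLERS:      # filled pause: user still thinking
        return False
    return silence_ms >= gap_threshold_ms  # plain pause long enough to respond

print(end_of_query(["set", "a", "timer", "um"], 900))  # False: filler holds floor
print(end_of_query(["set", "a", "timer"], 900))        # True: user finished
print(end_of_query(["set", "a", "timer"], 200))        # False: brief pause
```

A fixed threshold forces a trade-off between latency and cutting users off mid-disfluency, which is exactly why the paper predicts turn-taking jointly with recognition instead.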


AWS Touts Partners' Conversational AI Solutions

#artificialintelligence

Amazon Web Services is putting the focus on partners' conversational artificial intelligence (CAI) solutions that could spell the end of organizations' customers screaming "representative" to an interactive voice response phone system or getting stuck in a dead-end or circular digital chat loop. AWS is highlighting solutions from consulting partners including Cation Consulting, Deloitte Consulting, Quantiphi and TensorIoT and technology partners including NLX, ServisBOT and XAPP AI that allow organizations to deploy chatbots, virtual assistants and interactive voice response systems that incorporate AWS artificial intelligence and machine learning services. Their solutions employ services including Amazon Kendra, a machine learning-powered search tool that allows users to search unstructured text using natural language; Amazon Lex, a service for building conversational interfaces into applications using voice and text; and Amazon Polly, a text-to-speech service that converts text into lifelike speech. The new partner initiative comes as the demand for CAI interfaces continues to grow, according to Arte Merritt, who leads AWS partnerships for contact center intelligence and conversational AI. End-customers increasingly prefer to interact with businesses on digital channels, and businesses want to increase user satisfaction, reduce operational costs and streamline business processes, Merritt said in a blog post.